Module 11: Cross-validation and the Control of Error Rates
Abstract
This module emphasizes what might be termed “the practice of safe statistics.” The discussion is split into three parts: (1) the importance of cross-validation for any statistical method that relies on an optimization process based on a given data set (or sample); (2) the need to exert control on overall error rates when carrying out multiple testing, even when that testing is done only implicitly; (3) in the context of “big data” and associated methods for “data mining,” the necessity of some mechanism for ensuring the replicability of “found results.”
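As a rough illustration of point (2), the sketch below shows how quickly the chance of at least one false rejection grows with the number of tests, and how a Bonferroni-style adjustment restores control. It assumes the tests are independent and that every null hypothesis is true. The function names `familywise_error_rate` and `bonferroni_level` are hypothetical and are not taken from the module itself.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false rejection among m tests.

    Assumes the m tests are independent, each is run at level alpha,
    and all null hypotheses are true.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the overall error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 20, 100):
        unadjusted = familywise_error_rate(alpha, m)
        adjusted = familywise_error_rate(bonferroni_level(alpha, m), m)
        print(f"m={m:3d}  unadjusted={unadjusted:.3f}  bonferroni={adjusted:.3f}")
```

Under these assumptions, twenty tests at the 0.05 level give roughly a 64% chance of at least one spurious "finding," whereas testing each at 0.05/20 keeps the overall rate just under 0.05.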
Related resources
Tracking cross-validated estimates of prediction error as studies accumulate
In recent years “reproducibility” has emerged as a key factor in evaluating applications of statistics to the biomedical sciences, for example learning predictors of disease phenotypes from high-throughput “omics” data. In particular, “validation” is undermined when error rates on newly acquired data are sharply higher than those originally reported. More precisely, when data are collected from...
Factor Structure of the Smoking Temptation Scale: Cross-Validation in Iranian men
Background: The transtheoretical model (TTM) is used as a framework to implement smoking cessation programs. This model has some subscales based on which the smoking temptation scale is proposed as stages movement factor. This study aimed to translate and validate the temptation subscales of the TTM questionnaire in the Iranian population. Methods...
Long-term Streamflow Forecasting by Adaptive Neuro-Fuzzy Inference System Using K-fold Cross-validation (Case Study: Taleghan Basin, Iran)
Streamflow forecasting has an important role in water resource management (e.g., flood control, drought management, reservoir design). In this paper, an Adaptive Neuro-Fuzzy Inference System (ANFIS) is applied to long-term (monthly, seasonal) streamflow forecasting, and K-fold cross-validation is investigated as a way to evaluate the training and test data in the model. Then,...
Classification based upon gene expression data: bias and precision of error rates
Motivation: Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues consi...
Determining optimal value of the shape parameter $c$ in RBF for unequal distances topographical points by Cross-Validation algorithm
Several radial basis function based methods contain a free shape parameter which has a crucial role in the accuracy of the methods. Performance evaluation of this parameter in different functions with various data has always been a topic of study. In the present paper, we consider studying the methods which determine an optimal value for the shape parameter in interpolations of radial basis ...
Publication date: 2015